Emotion AI¶
Nutshell¶
In this project I build a program that classifies emotions from images of human faces, following the course Modern Artificial Intelligence, lectured by Dr. Ryan Ahmed, Ph.D., MBA.
The data set I use is from https://www.kaggle.com/c/facial-keypoints-detection/overview and consists of over 20000 facial images labeled with a facial expression/emotion, plus approximately 2000 images with keypoint annotations.
The program will train two models:
- one that detects facial keypoints
- one that detects emotions.
These models are then combined into a single model that outputs both the keypoints and the emotion.
A short recap of artificial neural networks¶
Artificial neurons are modelled on biological neurons. An artificial neuron takes in signals through input channels (the dendrites of a biological neuron), processes the information through a transfer function (the cell body), and generates an output (which in a biological neuron would travel along the axon).
Fig. 1. Side by side view of artificial and biological neurons. Credit: Top image from Introduction to Psychology (A critical approach) Copyright © 2021 by Rose M. Spielman; Kathryn Dumper; William Jenkins; Arlene Lacombe; Marilyn Lovett; and Marion Perlmutter licensed under a Creative Commons Attribution 4.0 International License. Bottom image Chrislb, CC BY-SA 3.0 , via Wikimedia Commons
For example, let's consider an artificial neuron (AN) that takes three inputs: $x_1$, $x_2$, and $x_3$. We can then express the output of the artificial neuron mathematically as $y = \phi(x_1 w_1 + x_2 w_2 + x_3 w_3 + b)$. Here $y$ is the output and the $w_i$ are the weights assigned to each input signal. $b$ is a bias term added to the weighted sum of inputs, and $\phi$ is the activation function.
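As a tiny illustration, such a neuron can be written directly in Python (the input values, weights and bias below are arbitrary examples, not taken from the project):

```python
import numpy as np

def neuron(x, w, b, phi):
    """An artificial neuron: the weighted sum of inputs plus bias, passed through phi."""
    return phi(np.dot(w, x) + b)

# Arbitrary example values, not taken from the project
x = np.array([0.5, -1.0, 2.0])   # inputs x1, x2, x3
w = np.array([0.4, 0.3, 0.1])    # weights w1, w2, w3
b = 0.1                          # bias
relu = lambda z: np.maximum(0.0, z)

y = neuron(x, w, b, relu)        # relu(0.2 - 0.3 + 0.2 + 0.1) = relu(0.2)
```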
Some common modern activation functions used in neural networks are ReLU, GELU and the logistic activation function. ReLU is short for Rectified Linear Unit and is defined as $\phi(x) = \max(0, x)$. ReLU is recommended for the hidden layers, since it outputs a linear response for positive values. This helps maintain larger gradients and makes training deep networks more feasible.
The Gaussian Error Linear Unit (GELU) is a smoother version of ReLU and is defined as $x\Phi(x)$, where $\Phi(x)$ is the cumulative distribution function of the standard Gaussian.
The logistic activation function is also called the sigmoid function and is defined as $\phi(x) = \frac{1}{1+e^{-x}}$. It maps any number into the range 0 to 1 and is therefore very useful in output layers.
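These three activation functions are simple enough to implement from scratch; a scalar sketch using only the standard library (the Gaussian CDF is expressed via the error function):

```python
import math

def relu(x):
    # max(0, x): linear for positive inputs, zero for negative ones
    return max(0.0, x)

def gelu(x):
    # x * Phi(x), with the standard Gaussian CDF written via the error function
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sigmoid(x):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(relu(-2.0), relu(3.0))   # 0.0 3.0
print(sigmoid(0.0))            # 0.5
```

Note how GELU approaches ReLU for large positive or negative inputs but is smooth around zero.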
Training¶
All neural networks need to be trained with labeled data. The available data is generally divided into 80% training and 20% testing data. It is also recommended to further split the training portion into an actual training set (e.g. 60%) and a validation set (e.g. 20%).
Training is done by adjusting the weights of the network, iteratively minimising the cost function with, for example, the gradient descent optimization algorithm. Gradient descent calculates the gradient of the cost function and then takes a step in the negative gradient direction, repeating until it reaches a local or global minimum.
A typical choice for a cost function is the quadratic loss, which is formulated as $f_{loss}(w,b)= \frac{1}{N}\sum^{N}_{i=1}(\hat y_i-y_i)^2$.
Gradient descent algorithm:
1. Derive the loss function with respect to the weights: $\frac{\partial f_{loss}}{\partial w}$
2. Pick random initial values for the weights and substitute them in.
3. Calculate the step size, i.e. how much we will update the weights:
step size = learning rate * gradient $=\alpha\frac{\partial f_{loss}}{\partial w}$
4. Update the parameters and repeat:
new weight = old weight - step size, $w_{new}=w_{old}-\alpha\frac{\partial f_{loss}}{\partial w}$
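The update loop above can be sketched for a one-parameter quadratic loss (the loss function, learning rate and starting point are made up for illustration):

```python
def gradient_descent(dloss_dw, w_init, learning_rate=0.1, steps=100):
    """Repeatedly step in the negative gradient direction."""
    w = w_init
    for _ in range(steps):
        step_size = learning_rate * dloss_dw(w)   # step size = learning rate * gradient
        w = w - step_size                         # new weight = old weight - step size
    return w

# Toy loss: loss(w) = (w - 3)^2, so dloss/dw = 2*(w - 3); the minimum sits at w = 3
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w_init=0.0)
print(round(w_min, 4))   # 3.0
```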
Below is an example of searching for the minimum of a U-shaped function with gradient descent. Usually the situation is multidimensional, but the simplified case is solved in the same way.
Testing various learning rates helps to understand the importance of choosing the training parameters well.
As shown above, a too large learning rate can lead to missing the global minimum and/or slow or failed convergence. Equally problematic are too small learning rates, with which the model barely learns. To address the problems arising from too small or too large learning rates, there are several approaches for adjusting the learning rate dynamically.
Momentum is analogous to a ball's tendency to keep rolling downhill. It is used to speed up learning when the error cost gradient keeps pointing in the same direction for a long time, and to slow down when a level area is reached. Momentum is controlled by a parameter analogous to the mass of the rolling ball. A large momentum helps avoid getting stuck in local minima, but might also push through the minimum we wish to find. Thus, the parameter has to be selected carefully.
Learning rates can also be adjusted through decay, which reduces the learning rate by a certain amount after a fixed number of epochs. This helps in situations like the one above, where a too large learning rate makes the optimiser jump back and forth over a minimum.
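Both ideas, velocity-based momentum and step decay, can be sketched in a few lines (the momentum value, decay factor and schedule below are illustrative, not the values used later in this project):

```python
def sgd_momentum_step(w, velocity, grad, learning_rate, momentum=0.9):
    """One momentum update: the velocity accumulates a decaying sum of past gradients."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Decay: cut the learning rate by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))

# Minimising the toy loss (w - 3)^2 with momentum (illustrative constants)
w, velocity = 0.0, 0.0
for _ in range(300):
    grad = 2.0 * (w - 3.0)
    w, velocity = sgd_momentum_step(w, velocity, grad, learning_rate=0.01)
# w is now close to the minimum at w = 3

print(step_decay(0.1, epoch=0))   # 0.1
```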
Adagrad and Adam are examples of popular adaptive algorithms for optimising gradient descent.
Network architectures¶
The artificial neurons are connected to each other to form neural networks and a plethora of different network architectures exist. To harness the power of AI, it is necessary to know which architecture serves the intended purpose best. Below are three common architectures and their applications.
Recurrent Neural Networks (RNNs) handle sequential data by maintaining a hidden state that captures information about previous elements in the sequence. Therefore they are great for contexts where the output depends on previous inputs, for example time series and natural language processing.
Generative Adversarial Networks (GANs) consist of two neural networks, the Generator and the Discriminator. They spar with each other in a zero-sum game framework: the generator creates synthetic data that resembles real data, and the discriminator evaluates whether it is real or not. This drives the generator to output increasingly realistic data. GANs are thus the obvious choice for much image generation and editing, but also for anomaly detection in industrial and security contexts: they can model regular patterns and subsequently detect anomalies by comparing generated outputs with real inputs.
Convolutional Neural Networks (CNN) are designed to process data with a grid-like topology and are most commonly used in image analysis. They utilise convolutional layers to learn spatial hierarchies by applying filters (kernels) that slide (convolve) over the input. They usually involve pooling layers that reduce the spatial dimensions and fully connected layers that map the extracted features to outputs.
Fig. 2. Convolutional neural network. Credit: Aphex34, CC BY-SA 4.0, via Wikimedia Commons
In the Emotion AI, I will use a Residual Neural Network (ResNet). ResNet's architecture includes "skip connections", which enable training very deep networks without vanishing gradient issues. The vanishing gradient problem occurs when the gradient is back-propagated to earlier layers and the resulting gradient becomes very small. A skip connection works by passing the input of one layer to a layer further down in the network; this is also called identity mapping. The ResNet model that I use has been pretrained with the ImageNet dataset.
Fig. 3. Identity mapping. Credit: LunarLullaby, CC BY-SA 4.0, via Wikimedia Commons
Part 1. Key facial points detection¶
In this section I program the DL model with a convolutional neural network and residual blocks to predict facial keypoints. The data set is from https://www.kaggle.com/c/facial-keypoints-detection/overview.
The dataset consists of input images, each annotated with 15 facial keypoints. The training.csv file has 7049 face images with corresponding keypoint locations. The test.csv file has face images only and will be used to test the model. Each image is stored as a string of pixel values, which has to be transformed into the real shape of the image, (96, 96). Thus we create a 1-D array from the string (of length 96 × 96 = 9216) and reshape it to a 2-D array.
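The conversion can be sketched as follows (demonstrated here on a tiny 2x2 "image" so the result is easy to check; the real data uses size=96):

```python
import numpy as np

def string_to_image(pixel_string, size=96):
    """Convert a space-separated pixel string into a (size, size) grayscale image."""
    flat = np.array(pixel_string.split(), dtype='float32')   # 1-D array, length size*size
    return flat.reshape(size, size)                          # 2-D image

# Tiny 2x2 demonstration of the same idea (the real data uses size=96)
demo = string_to_image('1 2 3 4', size=2)
print(demo.shape)   # (2, 2)
```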
The model I build has the architecture presented below. The Resblock consists of two different types of block: a convolution block and an identity block. As seen below, both have an additional short path that adds the original input to the output. For the convolution block this includes a few extra steps to shape the input to the same dimensions as the output of the longer path.
As a sanity check, 64 randomly chosen images are visualised along with their key facial points.
Image augmentation¶
Here I create an additional data set where the images are changed slightly to improve the generalisation of the final AI model. The idea is to get more data and more variability in e.g. orientation, lighting conditions, or size of the images. This reduces the likelihood of overfitting and ensures that the model learns the meaningful "concepts" of emotion recognition. I create 4 types of augmented images:
- horizontal flipping
- randomly increased brightness
- vertical flipping
- rotation by a random angle
(8560, 31)
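A numpy-only sketch of the flipping and brightness augmentations; note that flipping the image also requires mirroring the corresponding keypoint coordinates (the 96x96 size matches the data, the brightness range is illustrative):

```python
import numpy as np

def flip_horizontal(img, keypoints, width=96):
    """Mirror the image left-right and mirror the keypoint x-coordinates."""
    kp = keypoints.copy()
    kp[0::2] = width - kp[0::2]    # even indices hold x-coordinates
    return np.fliplr(img), kp

def flip_vertical(img, keypoints, height=96):
    """Mirror the image top-bottom and mirror the keypoint y-coordinates."""
    kp = keypoints.copy()
    kp[1::2] = height - kp[1::2]   # odd indices hold y-coordinates
    return np.flipud(img), kp

def increase_brightness(img, rng=None):
    """Multiply pixel intensities by a random factor, clipped to the 8-bit range."""
    rng = rng or np.random.default_rng()
    return np.clip(img * rng.uniform(1.0, 1.5), 0.0, 255.0)

# Rotation by a random angle would additionally rotate the keypoint coordinates
# around the image centre (e.g. scipy.ndimage.rotate for the pixel grid).

img = np.zeros((96, 96)); img[0, 0] = 255.0
kp = np.array([10.0, 20.0])                   # one keypoint at (x=10, y=20)
flipped, kp_flipped = flip_horizontal(img, kp)
print(kp_flipped[0])                          # 86.0, the mirrored x-coordinate
```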
Data normalization and scaling¶
Normalizing the image pixel values to range 0 - 1:
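Assuming 8-bit grayscale pixels, the scaling is a single division (a sketch with stand-in data):

```python
import numpy as np

# Stand-in data: 10 random 8-bit grayscale images in the dataset's (96, 96, 1) shape
img_array = np.random.randint(0, 256, size=(10, 96, 96, 1)).astype('float32')
img_array = img_array / 255.0                            # pixel values now lie in [0, 1]
print(img_array.min() >= 0.0, img_array.max() <= 1.0)    # True True
```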
# Split the data into train and test data
X_train_kp, X_test_kp, y_train_kp, y_test_kp = train_test_split(img_array, img_target, test_size=0.2, random_state=42)
(6848, 96, 96, 1)
(1712, 96, 96, 1)
(1712, 30)
(6848, 30)
Building the Residual Neural Network model for key facial points detection¶
Kernels modify the input by sweeping over it, as shown in this animation:
Fig. 4 Performing a convolution on 6x6 input with a 3x3 kernel using stride 1x1. Credit: Michael Plotke, CC BY-SA 3.0, via Wikimedia Commons.
For example, a 2D convolution command:
X = Conv2D(filters=64, kernel_size=(7,7), strides=(2,2), kernel_initializer = glorot_uniform(seed=0))(X_input)
The above function defines the following:
- use 64 distinct filters (each one is a trainable 7×7 “weight grid”).
- use stride 2x2, i.e., the filter jumps 2 pixels at a time, effectively “skipping” every other location.
- initialise the kernels with the glorot_uniform method, aka Xavier uniform initialization. This draws samples from a uniform distribution within a range determined by the number of input and output units.
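The spatial output size of such a convolution follows $\lfloor (n + 2p - f)/s \rfloor + 1$ for input size $n$, padding $p$, kernel size $f$ and stride $s$, which is easy to verify:

```python
def conv_output_size(n, f, stride, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * padding - f) // stride + 1

# First layer of the model: 96x96 input, ZeroPadding2D((3,3)), 7x7 kernel, stride 2
print(conv_output_size(96, f=7, stride=2, padding=3))   # 48
# The animation in Fig. 4: 6x6 input, 3x3 kernel, stride 1
print(conv_output_size(6, f=3, stride=1))               # 4
```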
The section below defines the model architecture using Keras.
# @title Resblock
def res_block(X, filter, stage):
    """
    Implementation of the Resblock.

    Arguments:
    X -- input tensor
    filter -- tuple/list of three integers (f1, f2, f3), the number of filters for each conv layer
    stage -- string, used to name the layers uniquely

    Returns:
    X -- output of the res block
    """
    ### 1: Convolutional block ###
    # Make a copy of the input
    X_shortcut = X
    f1, f2, f3 = filter

    # ---- Long (main) path ----
    # Conv2D (kernel 1x1)
    X = Conv2D(f1, kernel_size=(1, 1), strides=(1, 1), name=str(stage) + 'convblock' + '_conv_a',
               kernel_initializer=glorot_uniform(seed=0))(X)
    # MaxPool2D
    X = MaxPool2D(pool_size=(2, 2))(X)
    # BatchNorm, ReLU
    X = BatchNormalization(axis=3, name=str(stage) + 'convblock' + '_bn_a')(X)
    X = Activation('relu')(X)
    # Conv2D (kernel 3x3)
    X = Conv2D(f2, kernel_size=(3, 3), strides=(1, 1), padding='same', name=str(stage) + 'convblock' + '_conv_b',
               kernel_initializer=glorot_uniform(seed=0))(X)
    # BatchNorm, ReLU
    X = BatchNormalization(axis=3, name=str(stage) + 'convblock' + '_bn_b')(X)
    X = Activation('relu')(X)
    # Conv2D (kernel 1x1)
    X = Conv2D(f3, kernel_size=(1, 1), strides=(1, 1), name=str(stage) + 'convblock' + '_conv_c',
               kernel_initializer=glorot_uniform(seed=0))(X)
    # BatchNorm
    X = BatchNormalization(axis=3, name=str(stage) + 'convblock' + '_bn_c')(X)

    # ---- Short path ----
    # Conv2D to match the main path's dimensions
    X_shortcut = Conv2D(f3, kernel_size=(1, 1), strides=(1, 1), name=str(stage) + 'convblock' + '_conv_short',
                        kernel_initializer=glorot_uniform(seed=0))(X_shortcut)
    # MaxPool2D and BatchNorm
    X_shortcut = MaxPool2D(pool_size=(2, 2))(X_shortcut)
    X_shortcut = BatchNormalization(axis=3, name=str(stage) + 'convblock' + '_bn_short')(X_shortcut)

    # ---- Add paths together ----
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)

    ### 2: Identity block 1 ###
    # Save the input value (shortcut path)
    X_shortcut = X
    block = 'iden1'
    # First component: Conv2D -> BatchNorm -> ReLU
    X = Conv2D(f1, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_a',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_a')(X)
    X = Activation('relu')(X)
    # Second component: Conv2D (3x3) -> BatchNorm -> ReLU
    X = Conv2D(f2, (3, 3), strides=(1, 1), padding='same', name=str(stage) + block + '_conv_b',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_b')(X)
    X = Activation('relu')(X)
    # Third component: Conv2D (1x1) -> BatchNorm
    X = Conv2D(f3, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_c',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_c')(X)
    # Add shortcut value to the main path
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)

    ### 3: Identity block 2 ###
    # Save the input value (shortcut path)
    X_shortcut = X
    block = 'iden2'
    # First component: Conv2D -> BatchNorm -> ReLU
    X = Conv2D(f1, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_a',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_a')(X)
    X = Activation('relu')(X)
    # Second component: Conv2D (3x3) -> BatchNorm -> ReLU
    X = Conv2D(f2, (3, 3), strides=(1, 1), padding='same', name=str(stage) + block + '_conv_b',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_b')(X)
    X = Activation('relu')(X)
    # Third component: Conv2D (1x1) -> BatchNorm
    X = Conv2D(f3, (1, 1), strides=(1, 1), name=str(stage) + block + '_conv_c',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name=str(stage) + block + '_bn_c')(X)
    # Add shortcut value to the main path
    X = Add()([X, X_shortcut])
    X = Activation('relu')(X)

    return X
Next, build the final model.
# @title Final Resnet Neural Network model
input_shape = (96,96,1)
# Input tensor shape
X_input = Input(input_shape)
# Zero-padding
X = ZeroPadding2D((3,3))(X_input)
# Stage 1
X = Conv2D(filters = 64, kernel_size = (7,7), strides = (2,2), name='conv1', \
kernel_initializer = glorot_uniform(seed=0))(X)
X = BatchNormalization(axis = 3, name = 'bn_conv1')(X)
X = Activation('relu')(X)
X = MaxPooling2D((3,3), strides = (2,2))(X)
# Stage 2
X = res_block(X, filter = [64, 64, 256], stage = 'res1')
# Stage 3
X = res_block(X, filter = [128,128,512], stage = 'res2')
# We could also add more resblocks if we want
# X = res_block(X, filter= [256,256,1024], stage= 'res3')
# Average pooling
X = AveragePooling2D((2,2), name = 'avg_pool')(X)
# Flatten
X = Flatten()(X)
# Dense, ReLU, Dropout
X = Dense(4096, activation = 'relu')(X)
X = Dropout(0.2)(X)
X = Dense(2048, activation = 'relu')(X)
X = Dropout(0.1)(X)
X = Dense(30, activation = 'relu')(X)
model_1_facialKeyPoints = Model(inputs = X_input, outputs = X)
Explanations of components¶
The ZeroPadding adds a border of zeros (3 pixels wide) around the input image. This prevents information loss at the edges during convolutions.
Conv2D is the cake base of the convolutional network: it applies the filters to the input image and slides them with a set stride. This is how the features are extracted from the image.
The BatchNormalisation layer normalizes the output of the convolution, making training more stable. We can say it is the smooth cream layer on our convolution cake.
The ReLU activation function introduces non-linearity to the model.
MaxPooling2D reduces the spatial dimensions of the feature maps by taking the maximum value in a window, thereby downsampling the output. After the Resblocks, AveragePooling2D is used; it is similar to MaxPooling except that it calculates the average value within the window, again reducing the size of the feature maps. To give an impression of the impact of pooling: if we removed the MaxPool2D layers from the Resblocks, the final model would have 256 million parameters instead of 18 million.
Flatten converts the multi-dimensional feature maps into a single, long vector, preparing the data for the fully connected layers.
Dense creates a fully connected layer where each neuron is connected to every neuron in the previous layer. These fully connected layers process the features extracted by the convolutional layers.
Dropout layers are a regularisation technique that drops a set percentage of the neurons during training by setting them to zero. This makes the model less likely to overfit and decreases the interdependency between neurons, improving the performance and generalisability of the network.
The final model has a very complex structure with 18 million trainable parameters, which allows it to learn to identify emotions as well as, or even better than, an average human. However, that many parameters can lead to problems such as overfitting and slow or non-converging training; optimising this many parameters is not a trivial task.
Compiling and training the model¶
I will use the Adam optimization method for the training. Adam is a computationally efficient stochastic gradient method and it combines the gradient descent with momentum and the RMSP algorithm.
As discussed earlier, momentum speeds up training by adding a fraction of the previous gradient to the current one. RMSP, or Root Mean Square Propagation, is an adaptive learning algorithm that takes an 'exponential moving average' of the gradients: it adapts the learning rate for each parameter by keeping track of an exponentially decaying average of past squared gradients.
The algorithm proceeds as follows:
1. Calculate the gradient $g_t$
$g_t = \frac{\delta L }{\delta w_t}$
2. Update the Biased first moment estimate $m_t$
$m_t = \beta_1 m_{t-1} + (1-\beta_1)g_t$
This is similar to calculating the momentum as we keep track of the decaying average of past gradients.
3. Update the Biased Second Moment Estimate $v_t$
$v_t = \beta_2 v_{t-1} + (1-\beta_2)g_t^2$
This is similar to RMSP as we keep track of an exponentially decaying average of past squared gradients.
4. Bias correction for $m_t$ and $v_t$
Especially at the beginning of training, $m_t$ and $v_t$ are biased toward zero (because they are initialised at zero). Adam corrects this as follows:
$\hat m = \frac{m_t}{1-\beta_1^t}$, $\hat v = \frac{v_t}{1-\beta_2^t}$
5. Parameter update
$w_{t} = w_{t-1} - \alpha_t\frac{\hat m_t}{\sqrt{\hat v_t}+\epsilon}$
where,
$g_t$ = gradient of the loss with respect to the parameters at iteration $t$
$\alpha_t$ = learning rate at iteration $t$
$\beta_1, \beta_2$ = decay rates for the moment estimates
$\epsilon$ = small constant to prevent division by zero
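The five steps translate almost line for line into code; here Adam minimises the toy loss $L(w) = (w-3)^2$ (the loss, learning rate and number of steps are illustrative):

```python
import math

def adam_minimise(grad, w, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)                           # 1. gradient at iteration t
        m = beta1 * m + (1 - beta1) * g       # 2. biased first moment estimate
        v = beta2 * v + (1 - beta2) * g**2    # 3. biased second moment estimate
        m_hat = m / (1 - beta1**t)            # 4. bias-corrected estimates
        v_hat = v / (1 - beta2**t)
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)   # 5. parameter update
    return w

# Toy loss L(w) = (w - 3)^2, gradient 2*(w - 3); the minimum sits at w = 3
w_opt = adam_minimise(lambda w: 2.0 * (w - 3.0), w=0.0)
# w_opt ends up close to 3.0
```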
The TensorFlow implementation of the Adam optimizer accepts several arguments:
learning_rate: a float, or a schedule that adapts the learning rate during training.
beta_1: a float value or constant tensor giving the exponential decay rate for the 1st moment estimates, i.e. the means of the gradients. Default 0.9.
beta_2: a float value or constant tensor giving the exponential decay rate for the 2nd moment estimates, i.e. the uncentered variance of the gradients. Default 0.999.
amsgrad: True/False, whether to apply the AMSGrad variant of the algorithm from the paper On the Convergence of Adam and Beyond. Default False.
weight_decay: if set, applies the given weight decay.
Other things to consider when optimising¶
The batch size determines how many training examples are processed before the model's internal parameters are updated. Smaller batch sizes can speed up training per epoch because the model updates more frequently. However, this can lead to less stable convergence, i.e. the training loss may fluctuate more. A small batch size can be beneficial when the model is overfitting (the training loss is significantly lower than the validation loss).
A larger batch size leads to slower training per epoch and requires more memory, but can yield more stable parameter updates. The model usually converges more smoothly, but might not generalise as well due to "sharp minima".
Another way to tune the parameters of optimization is to use learning rate schedulers. Why? As training progresses, the model gets closer to a good solution. Smaller learning rates allow finer adjustments to the model's weights, helping it converge to a better minimum without overshooting (see the gradient descent examples in the beginning). I have implemented a learning rate schedule that reduces the learning rate if the validation loss does not improve within 5 epochs.
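The idea behind such a scheduler can be sketched in plain Python (similar in spirit to Keras's ReduceLROnPlateau callback; the reduction factor here is illustrative):

```python
class ReduceOnPlateau:
    """Cut the learning rate by `factor` when val_loss stalls for `patience` epochs."""
    def __init__(self, lr, factor=0.65, patience=5):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best, self.wait = float('inf'), 0

    def update(self, val_loss):
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0   # improvement: reset the counter
        else:
            self.wait += 1
            if self.wait >= self.patience:       # patience epochs without improvement
                self.lr *= self.factor
                self.wait = 0
        return self.lr

sched = ReduceOnPlateau(lr=8e-4)
losses = [10.0, 9.0, 9.5, 9.4, 9.6, 9.3, 9.2]   # val_loss stalls after the 2nd epoch
lrs = [sched.update(loss) for loss in losses]
# the last learning rate has been reduced by the factor 0.65
```

In Keras the same behaviour is available as the ReduceLROnPlateau callback (with monitor, factor and patience arguments), passed to model.fit via the callbacks list.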
After training, the model is saved in a .keras file. The .keras is a zip archive that contains:
- The architecture
- The weights
- The optimizer's status
# @title Compiling and training with 3 epochs
run_example = True
if run_example:
    adam = tf.keras.optimizers.Adam(learning_rate=0.0001, beta_1=0.9,
                                    beta_2=0.999, amsgrad=False)
    model_3_facialKeyPoints = Model(inputs=X_input, outputs=X)
    model_3_facialKeyPoints.compile(loss="mean_squared_error", optimizer=adam,
                                    metrics=['accuracy'])
    # Save the best model (lowest validation loss) here
    checkpoint = ModelCheckpoint(filepath="Models/FacialKeyPoints_model_16-12-2025.keras",
                                 verbose=1, save_best_only=True)
    history3 = model_3_facialKeyPoints.fit(X_train_kp, y_train_kp, batch_size=32,
                                           epochs=3, validation_split=0.05,
                                           callbacks=[checkpoint])
Epoch 1/3: accuracy 0.5069, loss 509.7457, val_accuracy 0.5627, val_loss 694.3258 (val_loss improved from inf; model saved)
Epoch 2/3: accuracy 0.6502, loss 27.1052, val_accuracy 0.5860, val_loss 131.6962 (improved; model saved)
Epoch 3/3: accuracy 0.6906, loss 18.1128, val_accuracy 0.7697, val_loss 27.9044 (improved; model saved)
Training log for the 100-epoch run (progress bars omitted; the log shown here ends during epoch 44): val_loss improved from 84.3509 at epoch 1 to 2.9637 at epoch 39, with the best model checkpointed at each improvement (epochs 1-4, 9, 11, 12, 14, 15, 18, 21, 22, 28, 33 and 39). ReduceLROnPlateau reduced the learning rate from 8.0e-4 to 5.2e-4 (after epoch 27), 3.38e-4 (after epoch 38) and 2.197e-4 (after epoch 44). Over the same epochs, training accuracy rose from about 0.54 to 0.86 and validation accuracy from 0.68 to about 0.89.
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8566 - loss: 5.2346 - val_accuracy: 0.8688 - val_loss: 3.3162 - learning_rate: 3.3800e-04 Epoch 45/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8636 - loss: 4.5413 Epoch 45: val_loss improved from 2.96368 to 2.77632, saving model to Models/FacialKeyPoints_model_16-12-2025.keras 102/102 ━━━━━━━━━━━━━━━━━━━━ 5s 52ms/step - accuracy: 0.8635 - loss: 4.5399 - val_accuracy: 0.8746 - val_loss: 2.7763 - learning_rate: 2.1970e-04 Epoch 46/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8630 - loss: 4.5546 Epoch 46: val_loss did not improve from 2.77632 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8631 - loss: 4.5525 - val_accuracy: 0.8863 - val_loss: 2.8350 - learning_rate: 2.1970e-04 Epoch 47/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8631 - loss: 4.5181 Epoch 47: val_loss did not improve from 2.77632 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8631 - loss: 4.5253 - val_accuracy: 0.9067 - val_loss: 2.8909 - learning_rate: 2.1970e-04 Epoch 48/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8623 - loss: 4.8690 Epoch 48: val_loss did not improve from 2.77632 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8623 - loss: 4.8653 - val_accuracy: 0.9009 - val_loss: 3.1521 - learning_rate: 2.1970e-04 Epoch 49/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8673 - loss: 4.3649 Epoch 49: val_loss did not improve from 2.77632 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8673 - loss: 4.3639 - val_accuracy: 0.8921 - val_loss: 3.0881 - learning_rate: 2.1970e-04 Epoch 50/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8590 - loss: 4.5642 Epoch 50: val_loss did not improve from 2.77632 Epoch 50: ReduceLROnPlateau reducing learning rate to 0.0001428050090908073. 
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8591 - loss: 4.5627 - val_accuracy: 0.8863 - val_loss: 3.1290 - learning_rate: 2.1970e-04 Epoch 51/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8687 - loss: 4.3097 Epoch 51: val_loss improved from 2.77632 to 2.51120, saving model to Models/FacialKeyPoints_model_16-12-2025.keras 102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.8687 - loss: 4.3091 - val_accuracy: 0.8776 - val_loss: 2.5112 - learning_rate: 1.4281e-04 Epoch 52/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8692 - loss: 4.0908 Epoch 52: val_loss did not improve from 2.51120 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8692 - loss: 4.0924 - val_accuracy: 0.8805 - val_loss: 2.8092 - learning_rate: 1.4281e-04 Epoch 53/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8739 - loss: 4.2070 Epoch 53: val_loss improved from 2.51120 to 2.47897, saving model to Models/FacialKeyPoints_model_16-12-2025.keras 102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.8738 - loss: 4.2098 - val_accuracy: 0.8950 - val_loss: 2.4790 - learning_rate: 1.4281e-04 Epoch 54/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8753 - loss: 3.9069 Epoch 54: val_loss did not improve from 2.47897 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8753 - loss: 3.9092 - val_accuracy: 0.9067 - val_loss: 2.6902 - learning_rate: 1.4281e-04 Epoch 55/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8733 - loss: 4.2418 Epoch 55: val_loss did not improve from 2.47897 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8733 - loss: 4.2433 - val_accuracy: 0.8950 - val_loss: 2.9139 - learning_rate: 1.4281e-04 Epoch 56/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8633 - loss: 4.0361 Epoch 56: val_loss did not improve from 2.47897 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8634 - loss: 4.0389 - val_accuracy: 0.8776 - val_loss: 2.8956 - learning_rate: 1.4281e-04 Epoch 57/100 100/102 
━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8686 - loss: 4.2986 Epoch 57: val_loss did not improve from 2.47897 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8686 - loss: 4.2972 - val_accuracy: 0.8863 - val_loss: 3.9869 - learning_rate: 1.4281e-04 Epoch 58/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.8711 - loss: 4.1020 Epoch 58: val_loss did not improve from 2.47897 Epoch 58: ReduceLROnPlateau reducing learning rate to 9.282326063839719e-05. 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8710 - loss: 4.0995 - val_accuracy: 0.9038 - val_loss: 2.5309 - learning_rate: 1.4281e-04 Epoch 59/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8656 - loss: 4.0259 Epoch 59: val_loss did not improve from 2.47897 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8657 - loss: 4.0240 - val_accuracy: 0.8980 - val_loss: 2.5523 - learning_rate: 9.2823e-05 Epoch 60/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8812 - loss: 3.8122 Epoch 60: val_loss improved from 2.47897 to 2.46816, saving model to Models/FacialKeyPoints_model_16-12-2025.keras 102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.8810 - loss: 3.8140 - val_accuracy: 0.8980 - val_loss: 2.4682 - learning_rate: 9.2823e-05 Epoch 61/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8706 - loss: 3.8289 Epoch 61: val_loss improved from 2.46816 to 2.39652, saving model to Models/FacialKeyPoints_model_16-12-2025.keras 102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 34ms/step - accuracy: 0.8707 - loss: 3.8319 - val_accuracy: 0.9067 - val_loss: 2.3965 - learning_rate: 9.2823e-05 Epoch 62/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8731 - loss: 3.9521 Epoch 62: val_loss did not improve from 2.39652 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8730 - loss: 3.9500 - val_accuracy: 0.8950 - val_loss: 2.5407 - learning_rate: 9.2823e-05 Epoch 63/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8758 - loss: 3.9776 Epoch 63: val_loss 
improved from 2.39652 to 2.35033, saving model to Models/FacialKeyPoints_model_16-12-2025.keras 102/102 ━━━━━━━━━━━━━━━━━━━━ 3s 31ms/step - accuracy: 0.8758 - loss: 3.9766 - val_accuracy: 0.8892 - val_loss: 2.3503 - learning_rate: 9.2823e-05 Epoch 64/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8694 - loss: 3.9023 Epoch 64: val_loss did not improve from 2.35033 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8694 - loss: 3.9044 - val_accuracy: 0.8863 - val_loss: 2.4499 - learning_rate: 9.2823e-05 Epoch 65/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8763 - loss: 4.0924 Epoch 65: val_loss did not improve from 2.35033 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8763 - loss: 4.0896 - val_accuracy: 0.9096 - val_loss: 2.5051 - learning_rate: 9.2823e-05 Epoch 66/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8790 - loss: 3.7645 Epoch 66: val_loss did not improve from 2.35033 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8791 - loss: 3.7633 - val_accuracy: 0.8980 - val_loss: 2.4585 - learning_rate: 9.2823e-05 Epoch 67/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8746 - loss: 3.8432 Epoch 67: val_loss did not improve from 2.35033 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8747 - loss: 3.8459 - val_accuracy: 0.8892 - val_loss: 2.5714 - learning_rate: 9.2823e-05 Epoch 68/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8815 - loss: 4.0808 Epoch 68: val_loss did not improve from 2.35033 Epoch 68: ReduceLROnPlateau reducing learning rate to 6.033512036083267e-05. 
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8814 - loss: 4.0770 - val_accuracy: 0.9009 - val_loss: 2.6651 - learning_rate: 9.2823e-05 Epoch 69/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8842 - loss: 3.6016 Epoch 69: val_loss did not improve from 2.35033 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8841 - loss: 3.6034 - val_accuracy: 0.8921 - val_loss: 2.4923 - learning_rate: 6.0335e-05 Epoch 70/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8763 - loss: 3.6564 Epoch 70: val_loss did not improve from 2.35033 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8763 - loss: 3.6570 - val_accuracy: 0.9038 - val_loss: 2.4128 - learning_rate: 6.0335e-05 Epoch 71/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8832 - loss: 3.6022 Epoch 71: val_loss did not improve from 2.35033 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8832 - loss: 3.6028 - val_accuracy: 0.8805 - val_loss: 2.4937 - learning_rate: 6.0335e-05 Epoch 72/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8820 - loss: 3.6392 Epoch 72: val_loss did not improve from 2.35033 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8819 - loss: 3.6396 - val_accuracy: 0.9038 - val_loss: 2.7778 - learning_rate: 6.0335e-05 Epoch 73/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8769 - loss: 3.6124 Epoch 73: val_loss did not improve from 2.35033 Epoch 73: ReduceLROnPlateau reducing learning rate to 3.921782918041572e-05. 
102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8768 - loss: 3.6138 - val_accuracy: 0.8921 - val_loss: 2.4349 - learning_rate: 6.0335e-05 Epoch 74/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8788 - loss: 3.5607 Epoch 74: val_loss did not improve from 2.35033 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8787 - loss: 3.5615 - val_accuracy: 0.8834 - val_loss: 2.4719 - learning_rate: 3.9218e-05 Epoch 75/100 100/102 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.8758 - loss: 3.5777 Epoch 75: val_loss did not improve from 2.35033 102/102 ━━━━━━━━━━━━━━━━━━━━ 2s 19ms/step - accuracy: 0.8759 - loss: 3.5763 - val_accuracy: 0.8921 - val_loss: 2.6243 - learning_rate: 3.9218e-05 Epoch 75: early stopping Restoring model weights from the end of the best epoch: 63.
Assessing the trained key facial points detection model performance¶
# Evaluate the model on the keypoint test set
# For reference, the model from the course materials reaches loss 8.3705 and accuracy 0.8528 on the same X_test, y_test set.
result = model_1_facialKeyPoints.evaluate(X_test_kp, y_test_kp)
54/54 ━━━━━━━━━━━━━━━━━━━━ 11s 78ms/step - accuracy: 0.8771 - loss: 2.5321
# @title Printing out samples of predictions
fig, axes = plt.subplots(4,4, figsize=(10,10))
axes = axes.ravel()
out_path = "docs/pics/kp_train_pred_grid.png"
for i in range(16):
    axes[i].imshow(X_test_kp[i].reshape(96,96), cmap='gray')
    axes[i].axis('off')
    for j in range(1,31,2):
        axes[i].plot(predicted_kp.iloc[i,j-1], predicted_kp.iloc[i,j], marker='.', color=kp_color)
fig.tight_layout()
fig.savefig(out_path, dpi=200, bbox_inches="tight", transparent=True)
plt.close(fig)
display(Image(filename=out_path,width=600))
Part 2. Facial Expression detection¶
In this second part of the project, I train the second model, which classifies emotions. The data contains images belonging to 5 categories:
- 0 = Angry
- 1 = Disgust
- 2 = Sad
- 3 = Happy
- 4 = Surprise
The images in this data set are 48 × 48 px, so they need to be resized to 96 × 96 px before the expression detection model can be run together with the facial keypoint detection model.
Below are an example of an original image, the result of plain resizing, and the final image after interpolation.
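Since 96 px is exactly twice 48 px, the simplest possible upscaling is nearest-neighbour pixel repetition. A minimal NumPy sketch of that size change (the notebook itself uses proper interpolation, so `upscale_2x` here is only an illustration):

```python
import numpy as np

def upscale_2x(img: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upscaling: repeat every pixel in both axes."""
    return np.kron(img, np.ones((2, 2), dtype=img.dtype))

small = np.arange(48 * 48, dtype=np.float32).reshape(48, 48)
big = upscale_2x(small)
print(big.shape)  # (96, 96)
```

In practice an interpolating resize (bilinear or bicubic) gives smoother images than this blocky repetition, which is why the interpolated version is shown above.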
Visualising the images in the dataset with the emotions¶
Below are the counts of each emotion category. The data is highly imbalanced, with very few images portraying disgust and a large number in the happy category.
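One common way to compensate for such imbalance is to weight the loss by inverse class frequency. A minimal sketch with hypothetical per-class counts (not the actual counts from this dataset):

```python
import numpy as np

# Hypothetical per-class counts (0=Angry, 1=Disgust, 2=Sad, 3=Happy, 4=Surprise)
counts = np.array([4953, 547, 6077, 8989, 4002])

# Inverse-frequency weights; the count-weighted mean of the weights is 1.0,
# so rare classes (disgust) get weights above 1 and common ones below 1.
class_weights = counts.sum() / (len(counts) * counts)
print({i: round(float(w), 2) for i, w in enumerate(class_weights)})
```

A dict like this can typically be passed to Keras via `model.fit(..., class_weight=...)` so that rare classes contribute more to the loss.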
Data preparation and image augmentation¶
X shape (24568, 96, 96, 1), y shape (24568, 5)
X train shape (22111, 96, 96, 1), y train shape (22111, 5)
X val shape (1228, 96, 96, 1), y val shape (1228, 5)
X test shape (1229, 96, 96, 1), y test shape (1229, 5)
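The 90 / 5 / 5 split shown above can be reproduced with a shuffled index split. A minimal NumPy sketch (only the total sample count 24568 comes from the printout; the seed is arbitrary):

```python
import numpy as np

n = 24568
rng = np.random.default_rng(0)
idx = rng.permutation(n)  # shuffle indices before splitting

n_train = int(n * 0.90)           # 22111 training samples
n_val = (n - n_train) // 2        # 1228 validation samples
train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]  # remaining 1229 test samples

print(len(train_idx), len(val_idx), len(test_idx))  # 22111 1228 1229
```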
Data preprocessing¶
In the data preprocessing I will again normalize the data and perform image augmentation, as was done in Part 1 of the project.
First, I normalize the data to contain values between 0 and 1. Then, I use the following image augmentation techniques:
- rotating up to 15 degrees
- shifting the image horizontally up to 0.1 × image width
- shifting the image vertically up to 0.1 × image height
- shearing the image up to 0.1
- zooming the image up to 10 %
- horizontally flipping the image
- vertically flipping the image
- adjusting the brightness
The areas outside the image boundaries are filled by replicating the nearest pixels.
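The flip and brightness steps above can be sketched in plain NumPy; in the notebook the whole pipeline is handled by a Keras image-data generator, so the 50 % flip probability and the 0.8–1.2 brightness range below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(img: np.ndarray) -> np.ndarray:
    """Randomly flip and brightness-adjust a normalized [0, 1] grayscale image."""
    if rng.random() < 0.5:
        img = img[:, ::-1]          # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]          # vertical flip
    factor = rng.uniform(0.8, 1.2)  # brightness adjustment
    return np.clip(img * factor, 0.0, 1.0)  # keep values in [0, 1]

img = rng.random((96, 96))
out = augment(img)
print(out.shape)  # (96, 96)
```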
Build and train Deep Learning model for facial expression classification¶
The model I will build has the following architecture:
# @title Emotion recognition model
# Layer, model and initializer imports from the Keras API (added for completeness;
# in recent Keras versions the initializer alias is GlorotUniform)
from tensorflow.keras.layers import (Input, ZeroPadding2D, Conv2D, BatchNormalization,
                                     Activation, MaxPooling2D, AveragePooling2D, Flatten, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.initializers import glorot_uniform
input_shape = (96,96,1)
# Input tensor shape
X_input = Input(input_shape)
# Zero-padding
X = ZeroPadding2D((3,3))(X_input)
# Stage 1
X = Conv2D(64, (7,7), strides = (2,2), name = 'conv1', kernel_initializer=glorot_uniform(seed=0))(X)
X = BatchNormalization(axis = 3, name = 'bn1')(X)
X = Activation('relu')(X)
X = MaxPooling2D((3,3), strides = (2,2))(X)
# Stage 2
X = res_block(X, filter = [64,64,256], stage = 'res2')
# Stage 3
X = res_block(X, filter = [128,128,512], stage = 'res3')
# Stage 4 (optional)
#X = res_block(X, filter= [256,256,1024], stage = 'res4')
# Average pooling
X = AveragePooling2D((4,4), name = 'avg_pool')(X)
# Final layer
X = Flatten()(X)
X = Dense(5, activation = 'softmax', name = 'dense', kernel_initializer=glorot_uniform(seed=0))(X)
Emotion_det_model_2 = Model(inputs = X_input, outputs = X, name = 'Resnet18')
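The training log records checkpointing, plateau-based learning-rate decay, and early stopping with weight restoration. A sketch of matching Keras callbacks; the factor 0.65 follows from the logged learning-rate steps (1.0e-04 → 6.5e-05 → 4.225e-05, …), but the patience values are assumptions, since only their effects appear in the log:

```python
# Callback setup consistent with the training log; patience values are assumptions.
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping

callbacks = [
    # Save the model whenever validation loss improves on the best value so far
    ModelCheckpoint('Models/Emotion_det_model_16-12-2025.keras',
                    monitor='val_loss', save_best_only=True, verbose=1),
    # Multiply the learning rate by 0.65 when val_loss plateaus
    ReduceLROnPlateau(monitor='val_loss', factor=0.65, patience=5, verbose=1),
    # Stop training when val_loss stops improving and restore the best weights
    EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True, verbose=1),
]
```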
Epoch 1/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 98ms/step - accuracy: 0.3861 - loss: 1.4138 Epoch 1: val_loss improved from inf to 1.41906, saving model to Models/Emotion_det_model_16-12-2025.keras 346/346 ━━━━━━━━━━━━━━━━━━━━ 60s 111ms/step - accuracy: 0.3861 - loss: 1.4138 - val_accuracy: 0.3510 - val_loss: 1.4191 - learning_rate: 1.0000e-04 Epoch 2/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.4018 - loss: 1.3722 Epoch 2: val_loss did not improve from 1.41906 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.4018 - loss: 1.3722 - val_accuracy: 0.3420 - val_loss: 1.5051 - learning_rate: 1.0000e-04 Epoch 3/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.4280 - loss: 1.3311 Epoch 3: val_loss did not improve from 1.41906 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.4280 - loss: 1.3311 - val_accuracy: 0.3428 - val_loss: 1.4884 - learning_rate: 1.0000e-04 Epoch 4/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.4427 - loss: 1.3050 Epoch 4: val_loss did not improve from 1.41906 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.4427 - loss: 1.3050 - val_accuracy: 0.4560 - val_loss: 1.4508 - learning_rate: 1.0000e-04 Epoch 5/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.4537 - loss: 1.2874 Epoch 5: val_loss did not improve from 1.41906 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.4537 - loss: 1.2873 - val_accuracy: 0.3648 - val_loss: 1.5678 - learning_rate: 1.0000e-04 Epoch 6/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.4715 - loss: 1.2507 Epoch 6: val_loss improved from 1.41906 to 1.27781, saving model to Models/Emotion_det_model_16-12-2025.keras 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.4715 - loss: 1.2507 - val_accuracy: 0.4951 - val_loss: 1.2778 - learning_rate: 1.0000e-04 Epoch 7/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.4722 - loss: 1.2431 Epoch 7: val_loss did not improve from 1.27781 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - 
accuracy: 0.4723 - loss: 1.2431 - val_accuracy: 0.4707 - val_loss: 1.3927 - learning_rate: 1.0000e-04 Epoch 8/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.4861 - loss: 1.2247 Epoch 8: val_loss did not improve from 1.27781 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.4861 - loss: 1.2247 - val_accuracy: 0.4145 - val_loss: 1.3872 - learning_rate: 1.0000e-04 Epoch 9/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.4894 - loss: 1.2070 Epoch 9: val_loss improved from 1.27781 to 1.07033, saving model to Models/Emotion_det_model_16-12-2025.keras 346/346 ━━━━━━━━━━━━━━━━━━━━ 26s 75ms/step - accuracy: 0.4894 - loss: 1.2070 - val_accuracy: 0.5871 - val_loss: 1.0703 - learning_rate: 1.0000e-04 Epoch 10/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5040 - loss: 1.1821 Epoch 10: val_loss improved from 1.07033 to 0.97317, saving model to Models/Emotion_det_model_16-12-2025.keras 346/346 ━━━━━━━━━━━━━━━━━━━━ 26s 74ms/step - accuracy: 0.5040 - loss: 1.1821 - val_accuracy: 0.6091 - val_loss: 0.9732 - learning_rate: 1.0000e-04 Epoch 11/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5164 - loss: 1.1603 Epoch 11: val_loss did not improve from 0.97317 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5164 - loss: 1.1603 - val_accuracy: 0.5904 - val_loss: 1.0113 - learning_rate: 1.0000e-04 Epoch 12/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5150 - loss: 1.1665 Epoch 12: val_loss did not improve from 0.97317 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5150 - loss: 1.1665 - val_accuracy: 0.5904 - val_loss: 1.0509 - learning_rate: 1.0000e-04 Epoch 13/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5153 - loss: 1.1494 Epoch 13: val_loss improved from 0.97317 to 0.94859, saving model to Models/Emotion_det_model_16-12-2025.keras 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5153 - loss: 1.1494 - val_accuracy: 0.6336 - val_loss: 0.9486 - learning_rate: 1.0000e-04 Epoch 
14/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 71ms/step - accuracy: 0.5239 - loss: 1.1388 Epoch 14: val_loss did not improve from 0.94859 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 71ms/step - accuracy: 0.5239 - loss: 1.1388 - val_accuracy: 0.5513 - val_loss: 1.0847 - learning_rate: 1.0000e-04 Epoch 15/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5191 - loss: 1.1377 Epoch 15: val_loss did not improve from 0.94859 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5192 - loss: 1.1377 - val_accuracy: 0.6344 - val_loss: 0.9714 - learning_rate: 1.0000e-04 Epoch 16/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5317 - loss: 1.1307 Epoch 16: val_loss did not improve from 0.94859 346/346 ━━━━━━━━━━━━━━━━━━━━ 26s 74ms/step - accuracy: 0.5317 - loss: 1.1307 - val_accuracy: 0.5912 - val_loss: 1.0424 - learning_rate: 1.0000e-04 Epoch 17/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5284 - loss: 1.1242 Epoch 17: val_loss did not improve from 0.94859 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5284 - loss: 1.1242 - val_accuracy: 0.6067 - val_loss: 1.0344 - learning_rate: 1.0000e-04 Epoch 18/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5414 - loss: 1.1065 Epoch 18: val_loss improved from 0.94859 to 0.89885, saving model to Models/Emotion_det_model_16-12-2025.keras 346/346 ━━━━━━━━━━━━━━━━━━━━ 26s 74ms/step - accuracy: 0.5414 - loss: 1.1065 - val_accuracy: 0.6466 - val_loss: 0.8989 - learning_rate: 1.0000e-04 Epoch 19/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5354 - loss: 1.1080 Epoch 19: val_loss did not improve from 0.89885 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5354 - loss: 1.1081 - val_accuracy: 0.5415 - val_loss: 1.1652 - learning_rate: 1.0000e-04 Epoch 20/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 71ms/step - accuracy: 0.5408 - loss: 1.0957 Epoch 20: val_loss did not improve from 0.89885 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5408 - loss: 1.0958 - val_accuracy: 0.6523 - 
val_loss: 0.9121 - learning_rate: 1.0000e-04 Epoch 21/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5369 - loss: 1.0992 Epoch 21: val_loss did not improve from 0.89885 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5369 - loss: 1.0992 - val_accuracy: 0.6230 - val_loss: 0.9248 - learning_rate: 1.0000e-04 Epoch 22/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 71ms/step - accuracy: 0.5462 - loss: 1.0941 Epoch 22: val_loss improved from 0.89885 to 0.83682, saving model to Models/Emotion_det_model_16-12-2025.keras 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5462 - loss: 1.0941 - val_accuracy: 0.6775 - val_loss: 0.8368 - learning_rate: 1.0000e-04 Epoch 23/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5493 - loss: 1.0809 Epoch 23: val_loss did not improve from 0.83682 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5493 - loss: 1.0809 - val_accuracy: 0.6311 - val_loss: 1.0064 - learning_rate: 1.0000e-04 Epoch 24/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 71ms/step - accuracy: 0.5507 - loss: 1.0862 Epoch 24: val_loss did not improve from 0.83682 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5507 - loss: 1.0862 - val_accuracy: 0.6588 - val_loss: 0.8818 - learning_rate: 1.0000e-04 Epoch 25/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5484 - loss: 1.0819 Epoch 25: val_loss did not improve from 0.83682 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5484 - loss: 1.0819 - val_accuracy: 0.6686 - val_loss: 0.8533 - learning_rate: 1.0000e-04 Epoch 26/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5452 - loss: 1.0779 Epoch 26: val_loss did not improve from 0.83682 346/346 ━━━━━━━━━━━━━━━━━━━━ 26s 74ms/step - accuracy: 0.5452 - loss: 1.0779 - val_accuracy: 0.6564 - val_loss: 0.9244 - learning_rate: 1.0000e-04 Epoch 27/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5593 - loss: 1.0614 Epoch 27: val_loss did not improve from 0.83682 Epoch 27: ReduceLROnPlateau reducing learning rate 
to 6.499999835796189e-05. 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5593 - loss: 1.0614 - val_accuracy: 0.6034 - val_loss: 1.0909 - learning_rate: 1.0000e-04 Epoch 28/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5602 - loss: 1.0567 Epoch 28: val_loss did not improve from 0.83682 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5602 - loss: 1.0567 - val_accuracy: 0.6531 - val_loss: 0.8728 - learning_rate: 6.5000e-05 Epoch 29/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 71ms/step - accuracy: 0.5572 - loss: 1.0452 Epoch 29: val_loss improved from 0.83682 to 0.79482, saving model to Models/Emotion_det_model_16-12-2025.keras 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5572 - loss: 1.0452 - val_accuracy: 0.7052 - val_loss: 0.7948 - learning_rate: 6.5000e-05 Epoch 30/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5607 - loss: 1.0446 Epoch 30: val_loss did not improve from 0.79482 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5607 - loss: 1.0446 - val_accuracy: 0.6450 - val_loss: 0.9357 - learning_rate: 6.5000e-05 Epoch 31/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5652 - loss: 1.0389 Epoch 31: val_loss improved from 0.79482 to 0.74947, saving model to Models/Emotion_det_model_16-12-2025.keras 346/346 ━━━━━━━━━━━━━━━━━━━━ 26s 74ms/step - accuracy: 0.5652 - loss: 1.0389 - val_accuracy: 0.7166 - val_loss: 0.7495 - learning_rate: 6.5000e-05 Epoch 32/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5692 - loss: 1.0364 Epoch 32: val_loss did not improve from 0.74947 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5692 - loss: 1.0364 - val_accuracy: 0.7044 - val_loss: 0.7778 - learning_rate: 6.5000e-05 Epoch 33/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5628 - loss: 1.0403 Epoch 33: val_loss did not improve from 0.74947 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5628 - loss: 1.0403 - val_accuracy: 0.6694 - val_loss: 0.8482 - learning_rate: 
6.5000e-05 Epoch 34/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5724 - loss: 1.0341 Epoch 34: val_loss did not improve from 0.74947 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 72ms/step - accuracy: 0.5724 - loss: 1.0342 - val_accuracy: 0.7191 - val_loss: 0.7568 - learning_rate: 6.5000e-05 Epoch 35/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5729 - loss: 1.0349 Epoch 35: val_loss did not improve from 0.74947 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5729 - loss: 1.0349 - val_accuracy: 0.6987 - val_loss: 0.7701 - learning_rate: 6.5000e-05 Epoch 36/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 71ms/step - accuracy: 0.5737 - loss: 1.0287 Epoch 36: val_loss improved from 0.74947 to 0.73751, saving model to Models/Emotion_det_model_16-12-2025.keras 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5737 - loss: 1.0287 - val_accuracy: 0.7215 - val_loss: 0.7375 - learning_rate: 6.5000e-05 Epoch 37/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5734 - loss: 1.0301 Epoch 37: val_loss improved from 0.73751 to 0.72995, saving model to Models/Emotion_det_model_16-12-2025.keras 346/346 ━━━━━━━━━━━━━━━━━━━━ 26s 74ms/step - accuracy: 0.5734 - loss: 1.0301 - val_accuracy: 0.7337 - val_loss: 0.7299 - learning_rate: 6.5000e-05 Epoch 38/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 73ms/step - accuracy: 0.5726 - loss: 1.0256 Epoch 38: val_loss did not improve from 0.72995 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5726 - loss: 1.0256 - val_accuracy: 0.7305 - val_loss: 0.7316 - learning_rate: 6.5000e-05 Epoch 39/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5706 - loss: 1.0234 Epoch 39: val_loss did not improve from 0.72995 346/346 ━━━━━━━━━━━━━━━━━━━━ 25s 73ms/step - accuracy: 0.5706 - loss: 1.0233 - val_accuracy: 0.6987 - val_loss: 0.8048 - learning_rate: 6.5000e-05 Epoch 40/100 346/346 ━━━━━━━━━━━━━━━━━━━━ 0s 72ms/step - accuracy: 0.5739 - loss: 1.0210 Epoch 40: val_loss did not improve from 0.72995 346/346 
Training log (epochs 41–54, condensed): validation loss last improved at epoch 44 (0.72995 → 0.70902), saving the model to Models/Emotion_det_model_16-12-2025.keras. ReduceLROnPlateau then cut the learning rate in steps from 6.5e-05 to 4.225e-05 (epoch 42), 2.74625e-05 (epoch 49) and 1.7850626e-05 (epoch 54), after which early stopping ended training at epoch 54 with a final validation accuracy of 0.7272.
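The learning-rate drops in the log (6.5e-05 → 4.225e-05 → 2.74625e-05 → 1.7850626e-05) are each a multiplication by 0.65, which matches a ReduceLROnPlateau-style schedule with factor 0.65. A minimal pure-Python sketch of that logic (the factor and patience values here are inferred from the log, not taken from the actual training script):

```python
def plateau_schedule(val_losses, lr=6.5e-5, factor=0.65, patience=2):
    """Reduce lr by `factor` whenever val_loss fails to improve for
    `patience` consecutive epochs, mimicking Keras' ReduceLROnPlateau."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor  # e.g. 6.5e-05 -> 4.225e-05
                wait = 0
        history.append(lr)
    return history

# five epochs with no improvement trigger two reductions
print(plateau_schedule([1.0, 1.1, 1.2, 1.3, 1.4])[-1])  # -> 2.74625e-05
```

In the Keras callback the same behaviour is configured with `ReduceLROnPlateau(monitor='val_loss', factor=0.65, patience=...)`.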
Training samples: 22111, batch size: 64, steps per epoch: 346
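The steps-per-epoch figure follows directly from the sample count and batch size: 22111 samples in batches of 64 gives a partial final batch, so the count is rounded up. As a quick check:

```python
import math

samples = 22111
batch_size = 64

# the last, partial batch still counts as a step
steps_per_epoch = math.ceil(samples / batch_size)
print(steps_per_epoch)  # -> 346
```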
Evaluate model¶
Confusion matrix, accuracy, precision, and recall
Test-set evaluation: accuracy 0.7671, loss 0.6242
Classification report for emotion detection model:
| class | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.64 | 0.64 | 0.64 | 245 |
| 1 | 0.89 | 0.36 | 0.52 | 22 |
| 2 | 0.66 | 0.76 | 0.70 | 319 |
| 3 | 0.87 | 0.81 | 0.84 | 458 |
| 4 | 0.85 | 0.83 | 0.84 | 185 |
| accuracy | | | 0.76 | 1229 |
| macro avg | 0.78 | 0.68 | 0.71 | 1229 |
| weighted avg | 0.77 | 0.76 | 0.76 | 1229 |
The above table tells us that the classes with the least data (see the support column) have the weakest performance: class 1, with only 22 samples, reaches a recall of just 0.36. Precision is the percentage of samples predicted to be class x that are actually x, and recall is the percentage of class-x samples in the data that are correctly labeled as x; class 3, where we had the most samples, scores high on both. The F1-score is the harmonic mean of precision and recall and is calculated as

$F_1 = 2 \cdot \frac{\text{precision} \ \times \ \text{recall}}{\text{precision} \ +\ \text{recall}}$
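Plugging in class 3's scores from the report reproduces its F1 of 0.84 (the helper function below is just for illustration):

```python
def f1_score(precision, recall):
    # harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.87, 0.81), 2))  # -> 0.84
```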
Part 3. Combining the key point detection and facial expression recognition models¶
| | left_eye_center_x | left_eye_center_y | right_eye_center_x | right_eye_center_y | left_eye_inner_corner_x | left_eye_inner_corner_y | left_eye_outer_corner_x | left_eye_outer_corner_y | right_eye_inner_corner_x | right_eye_inner_corner_y | ... | nose_tip_y | mouth_left_corner_x | mouth_left_corner_y | mouth_right_corner_x | mouth_right_corner_y | mouth_center_top_lip_x | mouth_center_top_lip_y | mouth_center_bottom_lip_x | mouth_center_bottom_lip_y | emotion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 64.648270 | 39.462273 | 35.029705 | 27.303909 | 57.424828 | 38.391609 | 71.964722 | 43.019527 | 41.132412 | 31.116213 | ... | 52.116562 | 51.100479 | 69.342575 | 22.495335 | 57.190903 | 35.552544 | 63.602200 | 34.299168 | 67.525764 | 3 |
| 1 | 67.049782 | 37.382389 | 30.436636 | 31.989122 | 59.316116 | 38.093258 | 74.362892 | 38.955746 | 37.369408 | 34.813652 | ... | 60.451092 | 57.179199 | 80.493698 | 28.356564 | 76.011620 | 42.261841 | 78.572998 | 41.904636 | 82.303894 | 2 |
| 2 | 64.116463 | 36.904625 | 34.684856 | 35.691677 | 57.599121 | 37.935898 | 70.917572 | 37.860672 | 40.516914 | 37.410923 | ... | 59.986973 | 58.181126 | 76.441887 | 33.698921 | 75.366005 | 46.023739 | 74.968422 | 45.919083 | 78.991425 | 2 |
| 3 | 63.984192 | 36.043690 | 27.817284 | 38.471413 | 54.756351 | 37.784523 | 74.124146 | 36.159767 | 35.542854 | 38.942562 | ... | 52.165367 | 71.062103 | 59.365967 | 28.763102 | 60.063107 | 46.585697 | 60.998100 | 47.175484 | 65.630623 | 3 |
| 4 | 63.373016 | 40.186924 | 29.947830 | 38.409580 | 56.042046 | 41.237423 | 70.940956 | 41.500175 | 37.221470 | 40.498299 | ... | 60.592480 | 60.439510 | 77.742844 | 29.692043 | 76.036491 | 45.099701 | 77.343796 | 45.182957 | 79.698212 | 0 |
5 rows × 31 columns
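The table above is built by running both models on the same test images and joining their outputs: the keypoint model produces 30 coordinate values per face, and the emotion model's class scores are collapsed to a single label with argmax. A minimal numpy sketch of that joining step, using random arrays as stand-ins for the actual model predictions:

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-ins for the two models' predictions on 5 test images
keypoints = rng.uniform(0, 96, size=(5, 30))  # keypoint model: 15 (x, y) pairs per face
emotion_probs = rng.random(size=(5, 5))       # emotion model: scores for 5 classes

emotions = emotion_probs.argmax(axis=1)            # pick the most likely class
combined = np.column_stack([keypoints, emotions])  # 30 coordinates + 1 label per row

print(combined.shape)  # -> (5, 31)
```

This matches the "5 rows × 31 columns" shape of the dataframe above; in the actual pipeline the arrays come from `model.predict` on the test images rather than a random generator.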
Plotting test images with the combined model's keypoint and emotion predictions.